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Abstract 

The unceasing demand for continuous situational awareness calls for innovative and large-scale signal 
processing algorithms, complemented by collaborative and adaptive sensing platforms to accomplish the 
objectives of layered sensing and control. Towards this goal, the present paper develops a spline-based 
approach to field estimation, which relies on a basis expansion model of the field of interest. The model 
entails known bases, weighted by generic functions estimated from the field's noisy samples. A novel field 
estimator is developed based on a regularized variational least-squares (LS) criterion that yields finitely- 
parameterized (function) estimates spanned by thin-plate splines. Robustness considerations motivate well 
the adoption of an overcomplete set of (possibly overlapping) basis functions, while a sparsifying regularizer 
augmenting the LS cost endows the estimator with the ability to select a few of these bases that "better" 
explain the data. This parsimonious field representation becomes possible, because the sparsity-aware spline- 
based method of this paper induces a group-Lasso estimator for the coefficients of the thin-plate spline 
expansions per basis. A distributed algorithm is also developed to obtain the group-Lasso estimator using 
a network of wireless sensors, or, using multiple processors to balance the load of a single computational 
unit. The novel spline-based approach is motivated by a spectrum cartography application, in which 
a set of sensing cognitive radios collaborate to estimate the distribution of RF power in space and 
frequency. Simulated tests corroborate that the estimated power spectrum density atlas yields the desired 
RF state awareness, since the maps reveal spatial locations where idle frequency bands can be reused for 
transmission, even when fading and shadowing effects are pronounced. 
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I. Introduction 



Well-appreciated as a tool for field estimation, thin-plate (smoothing) splines find application in areas as 
diverse as climatology [27], image processing [9], and neurophysiology [21]. Spline-based field estimation 
involves approximating a deterministic map g : W 1 — > R from a finite number of its noisy data samples, by 
minimizing a variational least-squares (LS) criterion regularized with a smoothness-controlling functional. In 
the dilemma of trusting a model versus trusting the data, splines favor the latter since only a mild regularity 
condition is imposed on the derivatives of g, which is otherwise treated as a generic function. While this 
generality is inherent to the variational formulation, the smoothness penalty renders the estimated map 
unique and finitely parameterized [10, p. 85], [26, p. 31]. With the variational problem solution expressible 
by polynomials and specific kernels, the aforementioned map approximation task reduces to a parameter 
estimation problem. Consequently, thin-plate splines operate as a reproducing kernel Hilbert space (RKHS) 
learning machine in a suitably defined (Sobolev) space [26, p. 34]. 

Although splines emerge as variational LS estimators of deterministic fields, they are also connected to 
classes of estimators for random fields. The first class assumes that estimators are linearly related to the 
measured samples, while the second one assumes that fields are Gaussian distributed. The first corresponds 
to the Kriging method while the second to the Gaussian process model; but in both cases one deals with 
a best linear unbiased estimator (BLUE) [24]. Typically, wide sense stationarity is assumed for the field's 
spatial correlation needed to form the BLUE. The so-termed generalized covariance model adds a parametric 
nonstationary term comprising known functions specified a priori [17]. Inspection of the BLUE reveals 
that if the nonstationary part is selected to comprise polynomials, and the spatial correlation is chosen to 
be the splines kernel, then the Kriging, Gaussian process, and spline-based estimators coincide [26, p. 35]. 

Bearing in mind this unifying treatment of deterministic and random fields, the main subjects of this 
paper are spline-based estimation, and the practically motivated sparse (and thus parsimonious) description 
of the wanted field. Toward these goals, the following basis expansion model (BEM) is adopted for the 
target map 



with x G R 2 , / £ R, and the L 2 —norms {||^(/)||l 2 — l}^i normalized to unity. 

The bases \pv(f)} v L\ are preselected, and the functions ^(x) are to be estimated based on noisy 
samples of This way, the model-versus-data balance is calibrated by introducing a priori knowledge on 
the dependence of the map $ with respect to (w.r.t.) variable /, or more generally a group of variables, 
while trusting the data to dictate the functions g v {yi) of the remaining variables x. 




(1) 
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Consider selecting Nf, basis functions using the basis pursuit approach [8], which entails an extensive 
set of bases thus rendering JV& overly large and the model overcomplete. This motivates augmenting the 
variational LS problem with a suitable sparsity-encouraging penalty, which endows the map estimator with 
the ability to discard factors g v (x)b u {f) in (1), only keeping a few bases that "better" explain the data. 
This attribute is inherited because the novel sparsity-aware spline-based method of this paper induces a 
group-Lasso estimator for the coefficients of the optimal finitely-parameterized g v . Group-Lasso estimators 
are known to set groups of weak coefficients to zero (here the Nf, groups associated with coefficients per 
g v ), and outperform the sparsity-agnostic LS estimator by capitalizing on the sparsity present [29], [22]. An 
iterative group-Lasso algorithm is developed that yields closed-form estimates per iteration. A distributed 
version of this algorithm is also introduced for data collected by cooperating sensors, or, for computational 
load-balancing of multiprocessor architectures. A related approach to model selection in nonparametric 
regression is the component selection and smoothing operator (COSSO) [16]. Different from the approach 
followed here, COSSO is limited to smoothing-spline, analysis-of-variance models, where the target function 
is assumed to be expressible by a superposition of orthogonal component functions. Compared to the single 
group-Lasso estimate here, COSSO entails an iterative algorithm, which alternates through a sequence of 
smoothing spline [13, p. 151] and nonnegative garrote [7] subproblems. 

The motivation behind the BEM in (1) comes from our interest in spectrum cartography for wireless 
cognitive radio (CR) networks, a sensing application that serves as an illustrating paradigm throughout the 
paper. CR technology holds great promise to address fruitfully the perceived dilemma of bandwidth under- 
utilization versus spectrum scarcity, which has rendered fixed-access communication networks inefficient. 
Sensing the ambient interference spectrum is of paramount importance to the operation of CR networks, 
since it enables spatial frequency reuse and allows for dynamic spectrum allocation; see, e.g., [11], [19] 
and references therein. Collaboration among CRs can markedly improve the sensing performance [23], 
and is key to revealing opportunities for spatial frequency reuse [20]. Pertinent existing approaches have 
mostly relied on detecting spectrum occupancy per radio, and do not account for spatio-temporal changes 
in the radio frequency (RF) ambiance, especially at intended receiver(s) which may reside several hops 
away from the sensed area. 

The impact of this paper's novel field estimators to CR networks is a collaborative sensing scheme 
whereby receiving CRs cooperate to estimate the distribution of power in space x and frequency /, namely 
the power spectrum density (PSD) map ^(x, /) in (1), from local periodogram measurements. The estimator 
need not be extremely accurate, but precise enough to identify spectrum holes. This justifies adopting the 
known bases to capture the PSD frequency dependence in (1). As far as the spatial dependence is concerned, 
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the model must account for path loss, fading, mobility, and shadowing effects, all of which vary with the 
propagation medium. For this reason, it is prudent to let the data dictate the spatial component of (1). 
Knowing the spectrum at any location allows remote CRs to reuse dynamically idle bands. It also enables 
CRs to adapt their transmit-power so as to minimally interfere with licensed transmitters. The spline-based 
PSD map here provides an alternative to [4], where known bases are used both in space and frequency. 
Different from [1] and [4], the field estimator here does not presume a spatial covariance model or pathloss 
channel model. Moreover, it captures general propagation characteristics including both shadowing and 
fading; see also [15]. 

Notation: Bold uppercase letters will denote matrices, whereas bold lowercase letters will stand for column 
vectors. Operators ®, (.)', tr(.), rank(.), bdiag(.), E[-] will denote Kronecker product, transposition, matrix 
trace, rank, block diagonal matrix and expectation, respectively; |.| will be used for the cardinality of a set, 
and the magnitude of a scalar. The L2 norm of function b : E — > E is ||&|| 2 := b 2 (f)df, while the 
£ p norm of vector x G W is ||x|| p := (Yh=i \ x i\ p ) 1/P for V > 1; and ||M|| F := -y/tr (MM') is the matrix 
Frobenious norm. Positive definite matrices will be denoted by M y 0. The p x p identity matrix will be 
represented by I p , while P will denote the p x 1 vector of all zeros, and pxq '■= P 0' . The i-th vector 
in the canonical basis for W will be denoted by e p j, i = 1, . . . ,p. 

II. BEM for Spectrum Cartography 

Consider a set of N s sources transmitting signals using portions of the overall bandwidth B. 

The objective of revealing which of these portions (sub-bands) are available for new systems to transmit, 
suggests that the PSD estimate sought does not need to be super accurate. This motivates modeling the 
transmit-PSD of each u s (t) as 

N b 

<f> s (f) = J26sMf), s = l,...,N s (2) 

where the basis b u (f) is centered at frequency f u , u = 1, . . . , iVj,. The example depicted in Fig. 1 involves 
(generally overlapping) raised cosine bases with support B v = [f v — (1 + p) /2T S , f v + (1 + p)/2T s ], where 
T s is the symbol period, and p stands for the roll-off factor. Such bases can model transmit-spectra of 
e.g., multicarrier systems. In other situations, power spectral masks may dictate sharp transitions between 
contiguous sub-bands, cases in which non-overlapping rectangular bases may be more appropriate. All in 
all, the set of bases should be selected to accommodate a priori knowledge about the PSD. 

The power transmitted by source s will propagate to the location x G R 2 according to a generally 
unknown spatial loss function Z s (x) : M 2 — > M. The propagation model Z s (x) not only captures frequency- 
flat deterministic pathloss, but also stationary, block-fading and even frequency-selective Rayleigh channel 
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effects, since their statistical moments do not depend on the frequency variable. In this case, the following 
vanishing memory assumption is required on the transmitted signals for the spatial receive-PSD $(x, /) 
to be factorizable as Z s (x)<fr s (/); see [4] for further details. 

(as) Sources are stationary, mutually uncorrelated, independent of the channels, and have 

vanishing correlation per coherence interval; i.e., r ss (r) := E[u s {t + T)u s (t)] = 0, V |r| > T c — L, where 
T c and L represent the coherence interval and delay spread of the channels, respectively. 

Under (as), the contribution of source s to the PSD at point x is Z s (x) J2u=i Qsubv{f)\ and the PSD due 
to all sources received at x will be given by $(x, /) = Yl s =i ^s( x ) E v =i 9svb v (f). Such a model can be 
simplified by defining the function ^(x) := Yls=i ^s^s( x )- With this definition and upon exchanging the 
order of summation, the spatial PSD model takes the form in (1), where functions {gv(^-)}^=\ are to be 
estimated. They represent the aggregate distribution of power across space corresponding to the frequencies 
spanned by the bases {b v }. Observe that the sources are not explicitly present in (1). Even if this model 
could have been postulated directly for the cartography task at hand, the previous discussion justifies the 
factorization of the ^(x, /) map per band in factors depending on each of the variables x and /. 



The sensing strategy will rely on the periodogram estimate c/> rn (r) at a set of receiving (sampling) 
locations X := {x r }^: 1 G M?, frequencies T := {f n }n=i e ^> an ^ time-slots {r}^ =1 . In order to 
reduce the periodogram variance and mitigate fading effects, </> rn (r) is averaged across a window of T 
time-slots [4], to obtain 



Hence, the envisioned setup consists of N r receiving CRs, which collaborate to construct the PSD map 
based on PSD observations {ip rn }- The bulk of processing is performed centrally at a fusion center (FC), 
which is assumed to know the position vectors X of all CRs, and the sensed tones in F. The FC receives 
over a dedicated control channel, the vector of samples ip r := [yvi, • • • ,<PrN]' £ 1^ taken by node r for 
all r = 1, . . .,N r . 

While a BEM could be introduced for the spatial loss function Z s (x) as well [4], the uncertainty on the 
source locations and obstructions in the propagation medium may render such a model imprecise. This 
will happen, e.g., when shadowing is present. The alternative approach followed here relies on estimating 
the functions ^(x) based on the data {(p rn }- To capture the smooth portions of 3>(x, /), the criterion for 
selecting ^(x) will be regularized using a so termed thin-plate penalty [26, p. 30]. This penalty extends to 



III. Cooperative Spline-Based PSD Field Estimation 




(3) 



T=l 
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M 2 the one-dimensional roughness regularization used in smoothing spline models. Accordingly, functions 
{g v }v=i are estimated as 

1 N r n / N b \ 2 N b 

{9»}uU :=argmin j^tJ2Y1 U>™ - X)^( x r)M/n) + A Zl/ H V V(x)||^x (4) 

^' ye r r=ln=l \ y=l / u=l jR2 

where ||V 2 <7„||f denotes the Frobenius norm of the Hessian of g v . 

The optimization is over S, the space of Sobolev functions, for which the penalty is well defined [10, 
p. 85]. The parameter A > controls the degree of smoothing. Specifically, for A = the estimates in (4) 
correspond to rough functions interpolating the data; while as A — > oo the estimates yield linear functions 
(cf. V 2 <y„(x) = 02x2)- A smoothing parameter in between these limiting values will be selected using a 
leave-one-out cross-validation (CV) approach, as discussed later. 

A. Thin-plate splines solution 

The optimization problem (4) is variational in nature, and in principle requires searching over the infinite- 
dimensional functional space S. It turns out that (4) admits closed-form, finite dimensional minimizers 
(^(x), as presented in the following proposition which provides a generalization of standard thin-plate 
splines results; see e.g., [26, p.31], to the multi-dimensional BEM (1). 

Proposition 1: The estimates {g u }^=i m (4) ore thin-plate splines expressible in closed form as 

N r 

g v {yi) = y^/^if (||x - x r || 2 ) + a' ul x. + a u0 (5) 

r=l 

where K{p) := p 2 log(p), and [3 U := \p v \, ■ ■ ■ , PvN r ]' is constrained to the linear subspace B := {/3 £ 

^ Nr ■ Er=l Pr = 0, Er=l Pr*r = 2 , X r G X} for V = 1, . . . , N b . 

The proof of this proposition is given in Appendix A. 
Remark 1 (Overlapping frequency basis). If the basis functions {b u (f)} have finite supports which 
do not overlap, then (4) decouples per g v , and thus the results in [10], [26] can be applied directly. The 
novelty of Proposition 1 is that the basis functions with spatial spline coefficients in (1) are allowed to be 
overlapping. The implication of Proposition 1 is finite parametrization of the PSD map [cf. (5)]. This is 
particularly important for non-FDMA based CR networks. In the forthcoming Section IV, an overcomplete 
set {b v } is adopted in (1), and overlapping bases naturally arise therein. 

What is left to determine are the parameters a := [aio, ct'^, . . . , o^o? iY e M. 3Nb , and (3 := 
[f3[, . . . ,f3' N ]' £ M. NrNb in (5). To this end, define the vector ip := [tpn, . . . , (piw, ■ ■ ■ , <fN T ii ■ <PN r N]' £ 
M. NrN containing the network-wide data obtained at all frequencies in J- ' . Three matrices are also introduced 
collecting the regression inputs: i) T G M ArrX3 with rth row t' r := [1 x! r ] for r = 1, . . . , N r and x r G X ; ii) 
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B G R NxN " with nth row := [&i(/„), . . . , b Nb (f n )] for n = 1, . . . , N; and iii) K G R^-xtf- with ij-th 
entry [K]jj := ET(| |xj — Xj| |) for Xj, G X. Consider also the QR decompositions of T = [Qi Q2] [R' 0]' 
and B = [Oi fl 2 ] [T' 0]'. 

Upon plugging (5) into (4), it is shown in Appendix B that the optimal {a, (3} satisfy the following 
system of equations 

(B ® Q' 2 )cp = [(B'B ® Q' 2 KQ 2 ) + N r NXI Nb(Nr _ 3) ] 7 (6) 

[r ® R]a = (Oi Qi)v " (r <S> QiKQ 2 )7 (V) 

$ = (I Nb ® Q 2 ) 7 - (8) 

Matrix Q 2 KQ 2 is positive definite, and rank(T ® R) = rank(r)rank(R); see e.g., [18]. It thus follows 
that (6)-(7) admit a unique solution if and only if T and R are invertible (correspondingly, B and T 
have full column rank). These conditions place practical constraints that should be taken into account 
at the system design stage. Specifically, T has full column rank if and only if the points in X, i.e., 
the CR locations, are not aligned. Furthermore, B will have linearly independent columns provided the 
basis functions {b u (f)}^l 1 comprise a linearly independent and complete set, i.e., B C \J v B u . Note that 
completeness precludes all frequencies {f n }n=i from falling outside the aggregate support of the basis set, 
hence preventing undesired all-zero columns in B. 

Remark 2 (Practicality of uniqueness conditions). The condition on X does not introduce an actual 
limitation as it can be easily satisfied in practice, especially when the CRs are randomly deployed. Likewise, 
the basis set is part of the system design, and can be chosen to satisfy the conditions on B. Nonetheless, 
these conditions will be bypassed in Section IV by allowing for an overcomplete set of functions {b u }. 

The combined results in this section can be summarized in the following steps constituting the spline- 
based spectrum cartography algorithm, which amounts to estimating $(x, /): 

51. Given (p, solve (6)-(8) for a, (3, after selecting A as detailed in Appendix D. 

52. Substitute a and into (5) to obtain {ff I /(x)}^. 1 . 

53. Use {^(x)}^ in (1) to estimate $(x, /). 

B. PSD tracker 

The real-time requirements on the sensing radios and the convenience of an estimator that adapts to 
changes in the spectrum map are the motivating reasons behind the PSD tracker introduced in this section. 
The spectrum map estimator will be henceforth denoted by <&(x, /, r), to make its time dependence explicit. 
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Define the vector 4> n (r) := [0i n (r), . . . ,<pN r n( T )Y of periodogram samples taken at frequency f n by 
all CRs, and form the supervector $(t) := [0i(r), . . . , $' N (r)]' £ R NrN . Per time-slot r = 1, 2, . . ., the 
periodogram <p(r) is averaged using the following adaptive counterpart of (3): 

T 

<p(r) := S t ~ t '$(t') = 6<p(r " 1) + 0(r) (9) 

t'=1 

which implements an exponentially weighted moving average operation with forgetting factor 6 £ (0, 1). 
For every r, the online estimator 3>(x, /, r) is obtained by plugging in (1) the solution {^(x, r)}^. 1 
of (4), after replacing tp rn with <p rn (T) [cf. the entries of the vector in (9)]. In addition to mitigating 
fading effects, this adaptive approach can track slowly time-varying PSDs because the averaging in (9) 
exponentially discards past data. 

Suppose that per time-slot r, the FC receives raw periodogram samples 4>{t) from the CRs in order 
to update <J?(x, /, r). The results of Section III apply for every r, meaning that {<?„(x, r)}^. 1 are given 
by (5), while the optimum coefficients {d(r), /3(r)} are found after solving (6)-(8). Capitalizing on (9), 
straightforward manipulations of (6)-(8) show that {d(r),/3(r)} are recursively given for all r > 1 by 

$(t) = 5$(t - 1) + (Ijv s (8) Q 2 )Gi0(r) (10) 
d(r) = 5a(r - 1) + G 2 0(r) (11) 

where the time-invariant matrices Gi and G 2 are 

d := [(B'B ® Q' 2 KQ 2 ) + N r NXI Nb(Nr _ 3) ] ~ l (B ® Q' 2 ) 

g 2 : = [r ® R]" 1 [(ni ® Qi) - (r ® q' 1 kq 2 )Gi] . 

Recursions (10)-(11) provide a means to update $(x, /, r) sequentially in time, by incorporating the newly 
acquired data from the CRs in 4>{t). There is no need to separately update tp(r) as in (9), yet the desired 
averaging takes place. Furthermore, matrices Gi and G 2 need to be computed only once, during the startup 
phase of the network. 

IV. Group-Lasso on Splines 
An improved spline-based PSD estimator is developed in this section to fit the unknown spatial functions 
{OiAyLi i n th e model ^(x, /) = Y^v=i fff ( x )^(i0> w i tn a large (iV& S> N r N), and a possibly overcomplete 
set of known basis functions {by} 1 ^^. These models are particularly attractive when there is an inherent 
uncertainty on the transmitters' parameters, such as central frequency and bandwidth of the pulse shapers; 
or, e.g., the roll-off factor when raised-cosine pulses are employed. In particular, adaptive communication 
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schemes rely on frequently adjusting these parameters [12, Ch. 9]. A sizeable collection of bases to 
effectively accommodate most of the possible cases provides the desirable robustness. Still, prior knowledge 
available on the incumbent communication technologies being sensed should be exploited to choose the 
most descriptive classes of basis functions; e.g., a large set of raised-cosine pulses. This knowledge justifies 
why known bases are selected to describe frequency characteristics of the PSD map, while a variational 
approach is preferred to capture spatial dependencies. 

In this context, the envisioned estimation method should provide the CRs with the capability of selecting 
a few bases that "better explain" the actual transmitted signals. As a result, most functions g v are expected 
to be identically zero; hence, there is an inherent form of sparsity present that can be exploited to improve 
estimation. The rationale behind the proposed approach can be rooted in the basis pursuit principle, a term 
coined in [8] for finding the most parsimonious sparse signal expansion using an overcomplete basis set. 
A major differentiating aspect however, is that while the sparse coefficients in the basis expansions treated 
in [8] are scalars, model (1) here entails bases weighted by functions g v . 

The proposed approach to sparsity-aware spline-based field estimation from the space-frequency power 
spectrum measurements ip rn [cf. (3)], is to obtain {g v }u=i as 



Relative to (4), the cost here is augmented with an additional regularization term weighted by a tuning 
parameter \i > 0. Clearly, if fj, = then (12) boils down to (4). To appreciate the role of the new penalty 



values {g^(xi), . . . , ^(x^)} to zero for sufficiently large fi. Interestingly, it will be shown in the ensuing 
section that this is enough to guarantee that g v (~x) = Vx, for /i large enough. 

A. Estimation using the group-Lasso 

Consider the classical problem of linear regression; see, e.g. [13, p. 11], where a vector y £ W 1 of 
observations is available, along with a matrix X G R nxp of inputs. The group Lasso estimate for the vector 
of features C := ■ ■ ■ , Cat ]' G ^ p is defined as the solution to [3], [29] 




N b 

+ ||Mxi), • • -^AX-Nr)]' 



(12) 



term, note that the minimization of ||[g„(xi), . . . , ^(x^)]']^ intuitively shrinks all pointwise functional 




(13) 
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This criterion achieves model selection by retaining relevant factors £„ £ R p /-^ b in which the features 
are grouped. In other words, group-Lasso encourages sparsity at the factor level, either by shrinking to 
zero all variables within a factor, or by retaining them altogether depending on the value of the tuning 
parameter p, > 0. As [i is increased, more sub-vector estimates Q v become zero, and the corresponding 
factors drop out of the model. It can be shown from the Karush-Kuhn-Tucker optimality conditions that 
only for /x > /i max := maxj ||X^y||2 it holds that Ci = ■ ■ ■ = CN b = ®p/N b > so that the values of interest 
are fi £ [0,/x max ] [2]. 

The connection between (13) and the spline-based field estimator (12) builds on Proposition 1, which still 
holds in this context. That is, even though criteria (4) and (12) purposely differ, their respective solutions 
5y(x) have the same form in (5). Indeed, the adaptation of the proof in Appendix A to the new case 
is straightforward, since the additional penalty term in (12) depends on g v evaluated at the knots. The 
essential difference manifested by this penalty is revealed when estimating the parameters a and /3 in (5), 
as presented in the following proposition. 

Proposition 2: The spline-based field estimator (12) is equivalent to group-Lasso (13), under the identities 

l Nb ® {bdiag((N r N XQ' 2 KQ 2 ) 1 / 2 ,0)[KQ 2 T]" 1 } 



;[¥>', 0]', X: 



(14) 



with their respective solutions related by 

<?„(x) = y^p ur K(\\x- x. r \\ 2 ) + a' ul x + a u0 (15) 

r=l 

= bdiag(Q 2 ,I 3 )[KQ 2 T]- 1 ^ (16) 

where (3 U := [f3 vl , ... , f3 uNr ]' and a u := [a u0 , a! vl )'. 

The factors {C u }^=i in (13) are in one-to-one correspondence with the vectors {[(3' u , through 
the linear mapping (16). This implies that whenever a factor C, v is dropped from the linear regression model 
obtained after solving (13), then ^(x) = 0, and the term corresponding to b v {f) does not contribute to (1). 
Hence, by appropriately selecting the value of \x, criterion (12) has the potential of retaining only the most 
significant terms in <3?(x, /) = J2u=i 5^( x )^(/)> an ^ thus yields parsimonious PSD map estimates. All in 
all, the motivation behind the variational problem (12) is now unravelled. The additional penalty term not 
present in (4) renders (12) equivalent to a group-Lasso problem. This enforces sparsity in the parameters 
of the splines expansion for <3?(x, /) at a factor level, which is exactly what is needed to potentially null 
the less descriptive functions g v . 

Remark 3 (Comparison with the PSD map estimator in Section III). The sparsity-agnostic LS problem 
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(4) will not give rise to identically zero vectors {cx u , f3 v }, for any v. Even when Nb is not large, a sparsity- 
aware estimator will perform better if the underlying PSD is generated by a few basis functions. This is 
expected since the out-of-band residual error will increase when all basis functions enter the model (1); see 
also [4] for a related assessment. What is more, when the number of bases is sufficiently large (Nf, ^> N r N) 
matrix B is fat, and the approach in Section III is not applicable . On the other hand, it is admittedly more 
complex computationally to solve (13) than the system of linear equations (6)-(8). Because (12) is not a 
linear smoother, a leave-one-out (bi-) CV approach to select the tuning parameters A and fi does not enjoy 
the computational savings detailed in Appendix D. i^-fold CV can be utilized instead, with typical choices 
of K = 5 or 10, as suggested in [13, p. 242]. 

The group-Lassoed splines-based approach to spectrum cartography developed in this section can be 
summarized in the following steps to estimate the global PSD map <I>(x, /): 

51. Given ip and utilizing any group Lasso solver, obtain £ := . . . , C^v ]' by solving (13). 

52. Form the estimates <3;,/3 using the change of variables [Pl,a' u ]' = bdiag(Q2, Is)[KQ 2 T]" 1 ^ 
for v = 1, . . . , Nb- 

53. Substitute a and /3 into (15) to obtain {^(x)}^^. 

54. Use {(^(x)}^ in (1) to estimate $(x,/). 

Implementing S1-S4 presumes that CRs communicate their local PSD estimates to a fusion center, which 
uses their aggregation in ip to estimate the field. But what if an FC is not available for centrally running 
S1-S4? In certain cases, forgoing with an FC is reasonable when the designer wishes to avoid an isolated 
point of failure, or, aims at a network topology which scales well with an increasing number of CRs based 
on power considerations (CRs located far away from the FC will drain their batteries more to reach the 
FC). These reasons motivate well a fully distributed counterpart of S1-S4, which is pursued next. 

V. Distributed Group-Lasso for In-Network Spectrum Cartography 

Consider N r networked CRs that are capable of sensing the ambient RF spectrum, performing some 
local computations, as well as exchanging messages among neighbors via dedicated control channels. In 
lieu of a fusion center, the CR network is naturally modeled as an undirected graph Q(1Z,£), where the 
vertex set TZ := {1, . . . , N r } corresponds to the sensing radios, and the edges in £ represent pairs of CRs 
that can communicate. Radio r £ 1Z communicates with its single-hop neighbors in M r , and the size of 
the neighborhood is denoted by \Af r \. The locations {x^^j := X of the sensing radios are assumed 
known to the CR network. To ensure that the measured data from an arbitrary CR can eventually percolate 
throughout the entire network, it is assumed that the graph Q is connected; i.e., there exists a (possibly) 
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multi-hop communication path connecting any two CRs. 

For the purpose of estimating an unknown vector C := . . . , £' N ]' £ M. p , each radio r £ K has 
available a local vector of observations y r £ M" r as well as its own matrix of inputs X r £ M nrXp . Radios 



collaborate to form the wanted group-Lasso estimator (13) in a distributed fashion, using 

JV r N b 



glasso 



It 



x r c| 



(i 



are mm 
C 2 

r=l i/=l 



ElIC 



v 2 



(17) 



where y := [y^, . . . , y^J £ 



nxl with n := ^r=i n r, and X := [X' 1? . . . , X^J £ M. nxp . The motivation 
behind developing a distributed solver of (17) is to tackle (12) based on in-network processing of the local 
observations ip r := [ip r \, . . . , ^n]' available per radio [cf. (3)]. Indeed, it readily follows that (17) can be 
used instead of (13) in Proposition 2 when 



1 







x r 



1 



B 



J N r ,r 



r £ K 



I Nb [bdiag((Af r AfAQ / 2 KQ 2 ) 1 /2 )0 )[KQ 2 T]" 1 ] 
corresponding to the identifications n r = N Vr £ 1Z, p = Nj,N r . Note that because the locations {x r } are 
assumed known to the entire network, CR r can form matrices T, K, and thus, the local regression matrix 



A. Consensus-based reformulation of the group-Lasso 

To distribute the cost in (17), replace the global variable £ which couples the per-agent summands 
with local variables {C r }^=i representing candidate estimates of £ per sensing radio. It is now possible to 
reformulate (17) as a convex constrained minimization problem 



are mm 

B {CA 2 



IE 



r=l 



ru\\2 



(18) 



S. t. Cr = Cr', r £ ft, r' £ ftf r , Cr := [Crl, • • • , CrArj'- 

The equality constraints directly effect local agreement across each CR's neighborhood. Since the commu- 
nication graph Q is assumed connected, these constraints also ensure global consensus a fortiori, meaning 
that Cr = Cr' ; Vr, r' £ 1Z. Indeed, let P(r, r') : r, n, T2, . ■ . , r n , r' denote a path on Q that joins an arbitrary 
pair of CRs (r, r'). Because contiguous radios in the path are neighbors by definition, the corresponding 
chain of equalities Cr = d = Cr 2 = ■ ■ ■ = Cr„ = Cr' dictated by the constraints in (18) imply Cr = Crs 
as desired. Thus, the constraints can be eliminated by replacing all the {Cr} with a common C, in which 
case the cost in (18) reduces to the one in (17). This argument establishes the following result. 

Lemma 1: If Q is a connected graph, (17) and (18) are equivalent optimization problems, in the sense 
that Cgiasso = Cr, Vr £ TZ. 
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Problem (18) will be modified further for the purpose of reducing the computational complexity of the 

resulting algorithm. To this end, for a given ael p consider the problem 

1 N b 

nun-HCHl " a'C + ^ ||C,|| 2 , C: C' C'nJ (19) 

and notice that it is separable in the iV& subproblems 

min-IICHl -atC + HICIh, a : = Wi, • • • , a 7vJ'- ( 20 ) 

Interestingly, each of these subproblems admits a closed-form solution as given in the following lemma. 

Lemma 2: The minimizer of (20) is obtained via the vector soft-thresholding operator Ta(-) defined 
by 

C = %M := (Wl* " ^)+ ( 21) 

ll a ^l|2 

where (•), := max{-,0} . 

Problem (19) is an instance of the group-Lasso (13) when X'X = I p , and a := X'y- As such, result 
(21) can be viewed as a particular case of the operators in [22] and [28]. However it is worth to prove 
Lemma 2 directly, since in this case the special form of (20) renders the proof neat in its simplicity. 

Proof: It will be argued that the solver of (20) takes the form = ta. u for some scalar t > 0. This is 
because among all C, v with the same ^-norm, the Cauchy-Schwarz inequality implies that the maximizer 
of a'jXi/ is colinear with (and in the same direction of) a. u . Substituting £ u = ta. u into (20) renders the 
problem scalar in t > 0, with solution t* = (||a^|| — fj,), / (2||aj,||), which completes the proof. ■ 

In order to take advantage of Lemma 2, auxiliary variables -y r , r = 1, . . . , N r are introduced as copies 
of £ r . Upon introducing appropriate constraints 7 r = £ r that guarantee the equivalence of the formulations 
along the lines of Lemma 1, problem (18) can be recast as 



1^ 

are mm — > 



N,, 



(22) 



_ -v" ii2 , ftf_ \ ^ ii j. 
r A -rlr\\2 ' »t / 4 \\Srv\\ 

iVr u=l 

s. to Cr = It = Cr', r eK, r' e M r 
7r = Cr, r £ 1Z. 

The dummy variables 7^ are inserted for technical reasons that will become apparent in the ensuing 
section, and will be eventually eliminated. 

B. Distributed group-Lasso algorithm 

The distributed group-Lasso algorithm is constructed by optimizing (22) using the alternating direction 
method of multipliers (AD-MoM) [6]. In this direction, associate Lagrange multipliers v r ,v£ and v£ with 
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the constraints 7 r = £ r , £ r > = 7^' and £ r = 7^', respectively, and consider the augmented Lagrangian with 
parameter c > 



where for notational convenience we group the variables 7 := {7 r , {7^ } r 'eA/ r r }re'R.> an ^ multipliers 



Application of the AD-MoM to the problem at hand consists of a cycle of C c minimizations in a block- 
coordinate fashion w.r.t. {£ r } firstly, and 7 secondly, together with an update of the multipliers per iteration 
k = 0,1,2,.... Deferring the details to Appendix E, the four main properties of this procedure that are 
instrumental to the resulting algorithm can be highlighted as follows. 

i) Thanks to the introduction of the local copies £ r and the dummy variables *y£ , the minimizations 
of C c w.r.t. both {£ r } and 7 decouple per CR r, thus enabling distribution of the algorithm. 
Moreover, the constraints in (22) involve variables of neighboring CRs only, which allows the required 
communications to be local within each CR's neighborhood. 

ii) Introduction of the variables 7,. separates the quadratic cost ||y r — X r 7 r ||2 from the group-Lasso 
penalty Ylv=i \\Cru\\2- As a result, minimization of (23) w.r.t. £ r takes the form of (19), which admits 
a closed-form solution via the vector soft-thresholding operator 7^(-) in Lemma 2. 

iii) Minimization of (23) w.r.t. 7 consists of an unconstrained quadratic problem, which can also be solved 
in closed form. In particular, the optimal 7^ at iteration k takes the value 7^ (k) = (Cr{k) + Cr' (k)) /2, 
and thus can be eliminated. 

iv) It turns out that it is not necessary to cany out updates of the Lagrange multipliers {%'. , v£'} r < e _/v; 
separately, but only of their sums which are henceforth denoted by p r := Ylr'eM v^r + ■)• Hence, 
there is one price p r per CR r = 1, . . . , N r , which can be updated locally. 

Building on these four features, it is established in Appendix E that the proposed AD-MoM scheme 
boils down to four parallel recursions run locally per CR: 




(23) 



v : = 



{v r , {Vr'} r 'eAr r , {Vr'}r'eN~r}r&l- 




(24) 



V r (fc) = V r (k - 1) + c[Cr(*0 - 7r(*0] 



(25) 
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Algorithm 1 : DGLasso 

All radios r € 7Z initialize {£,.(0), 7 r (0), p r (— 1), v r (— 1)} to zero, and locally run: 

for k = 0, 1,. . . do 

Transmit Cr{k) to neighbors in Af r - 

Update p r (k) via p r (fc) = p r (k - 1) + cJ2 r 'eJ^ r l^(k) - Cr'(k)]. 
Update v r (fc) via v r (fc) = v r (fc — 1) + c[£ r (fc) — -f r (k)]. 
Update Cr{k + 1) using (26). 
Update 7 r (fc + 1) using (27). 
end for 



^ + 1 ) = dV P (2|M-| + l) ' l/ = 1 '"" Arb (26) 

7r (A: + 1) = [d, + X;,X r ] - 1 (X;,y r + cCr(k + 1) + v r (fc)) . (27) 

Recursions (24)-(27) comprise the novel DGLasso algorithm, tabulated as Algorithm 1. 

The algorithm entails the following steps. During iteration k + 1, CR r receives the local estimates 
{Cr'(k)}r'eAf r from the neighboring CRs and plugs them into (24) to evaluate the dual price vector p r (fe). 
The new multiplier v r (k) is then obtained using the locally available vectors {"f r (k), Cr{k)}. Subsequently, 
vectors {p r (k), v r (k)} are jointly used along with {Cr'{k)} r >eAf r to obtain £ r (k + l) via A^;, parallel vector 
soft-thresholding operations 7^(-) as in (21). Finally, the updated ~f r (k + 1) is obtained from (27), and 
requires the previously updated quantities along with the vector of local observations y r and regression 
matrix X r . The (k + l)st iteration is concluded after CR r broadcasts £ r (£; + 1) to its neighbors. Even if 
an arbitrary initialization is allowed, the sparse nature of the estimator sought suggests the all-zero vectors 
as a natural choice. Three additional remarks are now in order. 

Remark 4 (Distributed Lasso algorithm as a special case). When 2V& = p and there are as many groups 
as entries of C> then the sum ^2^=1 \\Cv\\ becomes the £i-norm of £, and group-Lasso reduces to Lasso. 
In this case, DGLasso offers a distributed algorithm to solve Lasso that coincides with the one in [5]. 
Remark 5 (Centralized Group-Lasso algorithm as a special case). For N r = 1, the network consists 
of a single CR. In this case, DGLasso yields a novel algorithm for the centralized group-Lasso estimator 
(17), which is specified as Algorithm 2. Notice that the thresholding operator 7^ in GLasso sets the entire 
sub-vector £ u (k + 1) to zero whenever \\cy u (k) — v u (k)\\2 does not exceed fj,, in par with the group- 
sparsifying property of group-Lasso. Different from [29], GLasso can handle a general (not orthonormal) 
regression matrix X. Compared to the block-coordinate algorithm of [22], GLasso does not require an inner 
Newton-Raphson recursion per iteration. If in addition A^ = p, then GLasso yields the Lasso estimator. 
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Algorithm 2 : GLasso 

Initialize {C(0), 7(0), v(— 1)} to zero, and run: 

for k = 0, 1,. . . do 

Update v(fc) = v(k - 1) + c[C(fc) - j(k)]. 

Update Cu{k + 1) = {l/c)% (cr/ v (k) - v„(*)) , u=l,...,N b . 

Update j(k + 1) = [cl p + X'Xf 1 (X'y + <(fc + 1) + v(fc)). 
end for 



Remark 6 (Computational load balancing). Update (27) involves inversion of the pxp matrix cIp+X'.X r , 
that may be computationally demanding for sufficiently large p. Fortunately, this operation can be carried out 
offline before running the algorithm. More importantly, the matrix inversion lemma can be invoked to obtain 
[dp + Xj.Xj.J~ 1 = c _1 I p — ~K' r (cl Jlr + X r X' r ) _1 X r . In this new form, the dimensionality of the matrix 
to invert becomes n r x n r , where n r is the number of locally acquired data. For highly underdetermined 
cases (n r <C p), (D)GLasso enjoys considerable computational savings through the aforementioned matrix 
inversion identity. One also recognizes that the distributed operation parallelizes the numerical computation 
across CRs: if GLasso is run at a central unit with all network- wide data available centrally, then the 
matrix to invert has dimension n = ^2 re -ji n r> which increases linearly with the network size N r . Beyond 
a networked scenario, DGLasso provides an attractive alternative for computational load balancing in 
contemporary multi-processor architectures. 

To close this section, it is useful to mention that convergence of Algorithm 1, and thus of Algorithm 2 
as well, is ensured by the convergence of the AD-MoM [6]. This result is formally stated next. 

Proposition 3: Let Q be a connected graph, and consider recursions (24)-(27) that comprise the DGLasso 
algorithm. Then, for any value of the step-size c > 0, the iterates Cr(k) converge to the group-Lasso 
solution [cf. (11)] as k — > oo, i.e., 

lim Cr(k) = Cgiasso, Vr G TZ. (28) 

k— >oo 

In words, all local estimates Cr{k) achieve consensus asymptotically, converging to a common vector that 
coincides with the desired estimator Cgiasso- Formally, if the number of parameters p exceeds the number 
of data n, then a unique solution of (13) is not guaranteed for a general design matrix X. Proposition 3 
remains valid however, if the right-hand side of (28) is replaced by the set of minima; that is, 

lim Cr(k) G arg min — V ||y P - X r C||2 +A t X] \\Ch- 

r=l v=\ 
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VI. Numerical tests 

Consider a set of N r = 100 CRs uniformly distributed in an area of lKm 2 , cooperating to estimate the 
PSD map generated by N s = 5 licensed users (sources) located as in Fig. 2 (left). The five transmitted 
signals are raised cosine pulses with roll-off factors p £ {0,1}, and bandwidths W £ {10,20,30} 
MHz. They share the frequency band B = [100, 260] MHz with spectra centered at frequencies f c = 
105, 140, 185, 215, and 240 MHz, respectively. Fig. 2 (right) depicts the PSD generated by the active 
transmitters. 

The PSD generated by source s experiences fading and shadowing effects in its propagation from x s 
to any location x, where it can be measured in the presence of noise. A 6-tap Rayleigh model is adopted 
for the multipath channel H s (f, r, x) between x s and x, whose expected gain adheres to the path-loss law 
E(\H S \ 2 ) = exp(— ||x s — xHl/A 2 ), with A = 0.8. A deterministic shadowing effect is generated by a 
18m-high and 500m-wide wall represented by the black segment in Fig. 2 (left). It produces a knife-edge 
effect on the power emitted by the antennas at a height of 20m. The simulated tests presented here account 
for the shadowing at ground level. 

A. Spectrum cartography 

When designing the basis functions in (1), it is known a priori that the transmitted signals are indeed 
normalized raised cosine pulses with roll-off factors p £ {0, 1}, and bandwidths W £ {10,20,30} MHz. 
However, the actual combination of bandwidths and roll-off factors used can be unknown, which justifies 
why an overcomplete set of bases becomes handy. Transmitted signals with bandwidth W = 10 MHz are 
searched over a grid of 16 evenly spaced center frequencies f c in B. Likewise, for W = 20 and 30 MHz, 
15 and 14 center frequencies are considered, respectively. This amounts to 2 x (16 + 15 + 14) = 90 possible 
combinations for p, W, and f c , thus Nb = 90 bases are adopted. 

Each CR computes periodogram samples rn (r) at N = 64 frequencies with SNR = —5 dB, and 
averages them across T = 100 time-slots to form ip rn , n = 1, . . . , 64 as in (3). These network-wide 
observations at T = 100 are collected in ip, and following steps S1-S4 at the end of Section IV, the spline- 
based estimator (12), and thus the PSD map <5(x, /) is formed. This map is summed across frequencies, 
and the result is shown in Fig. 3 (left) which depicts the positions of transmitting CRs, as well as the 
radially-decaying spectra of four of them (those not affected by the obstacle). It also identifies the effect 
of the wall by "flattening" the spectrum emitted by the fifth source at the top-left corner. Inspection of the 
estimate <3?(x, /) across frequency confirms that group-Lasso succeeds in selecting the candidate bases. Fig. 
4 (left) shows points representing ||Ct>||2> v = 1> ■ ■ ■ ■> Nb, where C, v is the sub- vector in the solution of the 
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group-Lasso estimator (13) associated with g v {~x) and b u (f). They peak at indexes u = 1, 28, 46, 51, and 
70 (circled in red), which correspond to the "ground-truth" model, since bases b\, 628 > &46) &5i> an d &70 
match the spectra of the transmitted signals. Even though approximately 75% of the variables drop out of 
the model, some spurious coefficients are retained and their norms are markedly smaller than those of the 
"ground-truth" bases. This is expected because based on finite samples there is no guarantee that group- 
Lasso will recover the exact support, in general. Nevertheless, the effectiveness of group-Lasso in revealing 
the transmitted bases is apparent when compared to other regularization alternatives. Fig. 4 (right) depicts 
the counterpart of Fig. 4 (left) when using a sparsity-agnostic ridge regression scheme instead of (13). In 
this case, no basis selection takes place, and the spurious factors are magnified up to a level comparable to 
three of the "true" basis function b u (f). To the best of our knowledge, no other basis selection methods in 
the literature are applicable to the nonparametric model (1) considered here. In particular, COSSO in [16] 
is not applicable since it does not provide a basis selection method and relies on orthogonality assumptions. 

In summary, this test case demonstrates that the spline-based estimator can reveal which frequency bands 
are (un)occupied at each point in space, thus allowing for spatial reuse of the idle bands. For instance, 
transmitter TX5 at the top-right corner is associated with the basis function b^{f), the only one of the 
transmitted five that occupies the 230 — 260 MHz sub-band. Therefore, this sub-band can be reused at 
locations x away from the transmission range of TX5, which is revealed in Fig. 3 (left). 

The group-Lasso estimator in SI was obtained via the GLasso algorithm developed in Section V (cf. 
Algorithm 2). The GLasso output at iteration k = 1,000 is compared to previous iterates in Fig. 3 
(right), which demonstrates the monotone decay of their difference, and thus corroborates convergence to a 
limit point. Then, it is verified numerically that £(1000) satisfies the necessary and sufficient conditions for 
optimality of (17), as given in [29]. These two tests together provide numerical confirmation of Proposition 
3 on the convergence of GLasso, and the optimality of the limit point. 

B. Tuning parameters via cross-validation 

Results in Figs. 3 (left) and 4 depend on the judicious selection of parameters A and \i in (12). Parameter 
A affects smoothness, which translates to congruence among PSD samples, allowing the CRs to recover 
the radial aspect of the transmit-power. Parameter fi controls the sparsity in the solution, which dictates 
the number of bases, and thus transmission schemes that the estimator considers active. 

To select A and \i jointly so that both smoothness and sparsity are properly accounted for, one could 
consider a two-dimensional grid of candidate pairs, and minimize the CV error over this grid. However, this 
is computationally demanding, especially because the nondifferentiable cost in (13) renders the shortcuts in 
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Appendix D not applicable (see also Remark 3). A three-step alternative is followed here. First, estimator 
(12) is obtained using an arbitrarily small value of A = 1 x 10~ 6 , and selecting fi = 0.1/i max , where /i max 
is given in subsection IV-A. In the second step, only the surviving bases are kept, and the sparsifying 
penalty is no longer considered, thus reducing the estimator to that of Section III. If the reduced matrix 

B, built from the surviving bases, is full rank (otherwise repeat the first step with a larger value of ii), 
the procedure in Appendix D is followed to adjust the value of A via leave-one-out CV. The result of this 
step is illustrated in Fig. 5 (left), where the minimizer \ cv = 7.9433 x 10~ 6 of the OCV cost is selected. 
The final step consists of reconsidering the sparsity enforcing penalty in (12), and selecting fi using 5-fold 
CV. The minimizer of the CV error ficv = 0.0078// max corresponding to this step is depicted in Fig. 5 
(right). Using the Xcv an d fJ-cv so obtained, the PSD map plotted in Fig. 3 (left) was constructed. The 
rationale behind this approach is that it corresponds to a single step of a coordinate descent algorithm for 
minimizing the CV error CV(X, fi). Function CV(X, p) is typically unimodal, with much higher sensitivity 
on fj, than on A, a geometric feature leading the first coordinate descent update to be close to the optimum. 

The importance of an appropriate fi value becomes evident when inspecting how many bases are retained 
by the estimator as \i decreases from // max to 1 x 10~ 4 / u max . The AT& lines in Fig. 6 (left) link points 
representing HC^O")]^ as ft takes on 20 evenly spaced values on a logarithmic scale, comprising the so- 
termed group-Lasso path of solutions. When p = p ma , x is selected, by definition the estimator forces all 
C v to zero, thus discarding all bases. As p tends to zero all bases become relevant and eventually enter 
the model, which confirms the premise that LS estimators suffer from overfitting when the underlying 
model is overcomplete. The cross-validated value pcv is indicated with a dashed vertical line that crosses 
the path of solutions at the values of HC^Ib- At this point, five sub- vectors corresponding to the factors 
v = 1, 28, 46, 51, and 70 are considerably far away from zero hence showing strong effects, in par 
with the results depicted in Fig. 4 (left). Certainly interesting would be to corroborate the effectiveness 
of the proposed PSD map estimator on real data comprising spatially distributed RF measurements. Upon 
availability of such dataset, this direction will be pursued and reported elsewhere. 

C. Example with real data 

The goal of this section is to demonstrate that the GLasso algorithm in Section V can be useful for 
applications other than the spline-based BEM for spectrum cartography dealt with in Sections III and 
IV. This demonstration will rely on the birthweight dataset from [14], considered also by the seminal 
group-Lasso work of [29]. The objective is to predict the human birthweight from p = 8 factors including 
the mother's age, weight, race, smoke habits, number of previous premature labors, history of 
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hypertension, uterine irritability, and number of physician visits during the first trimester 
of pregnancy. Third-order polynomials were considered to model nonlinear effects of the age and weight 
on the response, augmenting the model size to p = 12 by grouping the polynomial coefficients in two subsets 
of three variables. 

GLasso is run under this setup, over the set of N = 189 samples, with \i selected via 7-fold CV. Fig. 6 
(right) depicts the evolution of the factors' strength measured by ||C/|||> which - as expected - converge 
to the same prediction model as in [29]. Additionally, GLasso is capable of determining that the eighth 
factor (visits) is not significant even from the first iterations, allowing for early model selection. 

VII. Concluding Summary 

A basis expansion approach was introduced in this paper to estimate a multi-dimensional field, whose 
dependence on a subset of its variables is modeled through preselected (and generally overlapping) basis 
functions weighted by unknown coefficient-functions of the remaining variables. The unknown coefficient 
functions can be estimated from the field's noisy samples, by solving a variational LS problem which admits 
infinite solutions. Without extra constraints, the estimated field interpolates perfectly the data samples, at the 
price of severely overfitting the true field elsewhere. The first contribution was to regularize this variational 
LS cost by a smoothing term, which can afford a unique finite -parameter spline-based solution. The latter 
is expressed in terms of radial kernels and polynomials whose parameters were estimated in closed form. 
A recursive PSD tracker was also developed for slowly time-varying spectra. 

The second main contribution pertains to a robust variant of the function estimator, when an overcomplete 
set of bases is adopted to effectively accommodate model uncertainties. The novel estimator here minimizes 
the variational LS cost regularized by a term that performs basis selection, and thus yields a parsimonious 
description of the field by retaining those few members of the basis that "better" explain the data. This 
attribute is achieved because the added penalty induces a group (G)Lasso estimator on the parameters of 
the kernels and polynomials. Even though the number of unknowns increases with overcomplete bases, 
most coefficients are zero, meaning that the complexity remains at an affordable level using the sparsity- 
promoting GLasso. Notwithstanding, (group-) Lasso here is introduced to effect (group-) sparsity in the 
space of smooth functions. 

The third contribution is a provably convergent GLasso estimator developed based on AD-MoM iterations. 
It entails parallel closed-form updates, which involve simple vector soft-thresholding operations per factor. 
Its fully-distributed counterpart was also developed for use by a network of wireless sensors, or, multiple 
processors to balance the load of a computational cluster. It is worth stressing that both GLassso and 
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DGLasso are standalone tools for sparse linear regression, applicable to a gamut of problems that go 
beyond the field estimation context of this paper. 

The fourth contribution is in the context of wireless CR network sensing (our overarching practical 
motivation), where the overcomplete estimated field enables cartographing the space-frequency distribution 
of power generated by RF sources whose transmit-PSDs are shaped by, e.g., raised-cosine pulses with 
possibly different roll-off factors, center frequencies, and bandwidths. Using periodogram samples collected 
by spatially distributed CRs, the sparsity-aware spline-based estimator yields an atlas of PSD maps (one 
map per frequency). As corroborated by simulations, the atlas enables localizing the sources and discerning 
their transmission parameters, even in the presence of frequency-selective Rayleigh fading and pronounced 
shadowing effects due to e.g., an obstructing wall. Simulated tests also illustrated the convergence of Glasso, 
and confirmed that the sparsity-promoting regularization is effective in selecting those basis functions that 
strongly influence the field, when the tuning parameters are cross-validated properly. 

Given the existing connections between splines and classical estimators for both random and deterministic 
field models, the spline-based methods developed in this paper provide a unifying framework suitable for 
both paradigms. The model and the resultant (parsimonious) estimates can thus be used in more general 
statistical inference and localization problems, whenever the data admit a basis expansion over a proper 
subset of its dimensions. Furthermore, results in this paper extend to kernels other than radial basis functions, 
whenever the smoothing penalty is replaced by a norm induced from an RKHS. Also of interest is to quantify 
the number of data required to attain a prescribed approximation error, in light of the existing connections 
between spline-based reconstruction and Shannon's sampling theory [25]. 
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Appendix 



A. Proof of Proposition 1 : Rewrite (4) as 



mm 



mmVW^-^x,)^/,,) +X ||V 2 5l (x)||^x +AV / \\V 2 g v (x)\\ 2 F dx 

(29) 

with (pin, 1 ^ := ifm — Y^v=2 5 f i^( x r)^(/n)- Focusing on the inner minimization w.r.t. g\, fix the set of 
functions {g u }v=2> an d note that only the first two terms are relevant (those within the square brackets). 
It follows from [10, Theorem 4 bis] that g\ takes the form in (5), with coefficients f3i,ot\\ and a\o that 
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depend on G^" 1 ' := {g u (x r ), r = 1, . . . , N r , v = 2, . . . , iV&} through <frn- The next step is to minimize 
(29) w.r.t. $2 but with {g u }„=3 fixed, which amounts to 

AT r JV 2 r r 

minVVte 2 )-5i(x06i(/ n )- 5 2(x r .)6 2 (/ n )) +A/ ||V 2 pi(x)|||dx + A / ||V 2 52 (x)|||dx 

(30) 

where i^n^ := — ^2u=3 9^( x r)^(/n)- In the first two summands of the cost in (30), g\ depends on 
52 via G*" 1 ). Because G^~ x > only involves evaluating gi on Af, [10, Theorem 4 bis] can be applied again, 
and the optimal solution §2 takes the from (5). The same argument carries over to subsequent minimization 
steps for v = 3, . . . , iV&, establishing that all {<7„(x)} are thin-plate splines as in (5). 
B. Proof of (6)-(8): Upon substituting (5) into (4), it will shown next that the optimal coefficients {a, (3} 
specifying {gv(^)}u=i are obtained as solutions to the following constrained, regularized LS problem 

mm -J— \\<p - (B ® K)0 - (B ® T)a|| 2 + Xf3'(I Nb ® K)/3 

ct,p iv r iV 

s. t. (I^»T')/3 = 3 jv 6 . (31) 

Observe first that the constraints (3 V G £? in Proposition 1 can be expressed as T'/3„ = O3 for each 
1/ = 1, . . . , iVj,, or jointly as (Ijv b ® T')/3 = 03^. For the optimization objective in (31), note from (5) that 
g u (xr) = + t' r a u , where k', and t'. are the rth rows of K and T, respectively. The first term in the 
cost of (4) can be expressed (up to a factor (iVj-iV) -1 ) as 

JV N r ( N b \ 2 N N r 

n=l r=l \ u=l / n=l r=l 

N 



J2\\<Pn-K®K)P-(K® T )<x\ 

n=l 

\\tp- (B®K)/3- (B®T)a|| 2 . 



Consider next the penalty term in the cost of (4). Substituting into (5), it follows that J R2 ||V 2 5 J y(x)|| 2 7 o?x = 
(3' U K(3 U [26, p. 33]. It thus holds that 

JV„ Aft, 

A E / J|V 2 &(x)||£dx = A^T/^KA, = A/3'(Iat 6 K)/3 

v=i ^ R2 ^=1 

from which (31) follows readily. 

Now that the equivalence between (4) and (31) has been established, the latter must be solved for a and 
(3. Even though K (hence Ijy 6 (g)K) is not positive definite, it is still possible to show that /3'(Ijv 6 (8)K)/3 > 
for any (3 such that (Ijv t ® T')/3 = 03jv b [10, p. 85], implying that (31) is convex. Proceeding along the 
lines of [26, p. 33], note first that the constraint (Ijy b ® T')/3 = 0sN b implies the existence of a vector 
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7 G M. Nb ( Nr ~ 3 ' satisfying (8). After this change of variables, (31) is transformed into an unconstrained 
quadratic program, which can be solved in closed form for {0,7}. Hence, setting both gradients w.r.t. a 
and 7} to zero yields (6) and (7). 

C. Proof of Proposition 2: After substituting (15) into (12), one finds the optimal {a, (3} specifying 
{gu(^)}u=i i n (15), as solutions to the following constrained, regularized LS problem 

1 N b 

mi £l^ll^- (B®K)/3- (B®T)a\\ 2 2 + \[3'(l Nb ® K)/3 + ||K&, + Ta v || 2 

v=\ 

s. t. (I Nb ®T')(3 = 3Nb . (32) 

With reference to (32), consider grouping and reordering the variables {a, (3} in the vector u := 
[ui, . . . ,u' N ]', where := [f3' v a' v ]'. As argued in Section III-A, the constraints Tj3 v = can be 
eliminated through the change of variables = bdiag(Q2, l3)v„ for v = 1,...,JV&; or compactly as 
u = (Ijv,, ® bdiag(Q2, Is))v. The next step is to express the three summands in the cost of (32) in terms 
of the new vector optimization variable v. Noting that \s! r (3 u + t' r ct v = [k.' r t' r ]w v , and mimicking the steps 
in Appendix A, the first summand is 



1 



N r N 



\<p-(B®K)/3 



(B®T)a\ 



1 



N r N 
1 

N r N 



\ip- (B® [K T])u 



|<^-(B®[KQ 2 T])v\\l 



(33) 



The second summand due to the thin-plate penalty can be expressed as 

N h N b N b 

A P'uKpu = A <bdiag(K, 0)u„ = XJ2 v>diag(Q 2 KQ 2 , 0)v„ 



v=l 



while the last term is ||K/3„ + Ta v \\ 2 = /^Et=ill[ K TKIb = ^Et=ill[ K Q 2 T]v„|| 2 

Combining (33) with (34) by completing the squares, problem (32) is equivalent to 

r 1 r 1 2 

1 



Av'(I 7V6 0bdiag(Q / 2 KQ 2 ,O))v 



(34) 



mm 



N r N 



+ A*5^||[KQ 2 TK|| 2 (35) 



<p B ® [KQ 2 T] 

J [ I Nb ® bdiag((iV r iVAQ 2 KQ 2 ) 1 /2 ) ) 

and becomes (13) under the identities (14), and after the change of variables £ := . . . , C'nJ' = (^N b ® 
[KQ 2 T])v. By definition of u, v, and £, the original variables can be recovered through the transformation 
in (16). 

D. Selection of the smoothing parameter in (4): The method to be developed builds on the so-termed leave- 
one-out CV, which proceeds as follows; see e.g., [26, Ch. 4]. Consider removing a single data point ip rn 
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from the collection of N r N measurements available to the sensing radios. For a given A, let i>\~ rr ^(x, /) 
denote the leave-one-out estimated PSD map, obtained by solving (4) following steps S1-S3 in Section 
III-A, using the N r N — 1 remaining data points. The aforementioned estimation procedure is repeated N r N 
times by leaving out each of the data points p rn , r = 1, . . . , N r and n = 1, . . . , N, one at a time. The 
leave-one-out or ordinary CV (OCV) [13, p. 242], [26, p. 47], for the problem at hand is given by 

OCV(A) = ^EE fa™ ~ ^(.Xr, fn)) (36) 
r=l n=l 

while the optimum A is selected as the minimizer of OCV(A), over a grid of values A G [0, A max ]. Function 
(36) constitutes an average of the squared prediction errors over all data points; hence, its minimization 
offers a natural criterion. The method is quite computationally demanding though, since the system of 
linear equations (6)-(8) has to be solved N r N times for each value of A on the grid. Fortunately, this 
computational burden can be significantly reduced for the spline-based PSD map estimator considered 
here. 

Recall the vector ip collecting all data points measured at locations X and frequencies F. Define next a 
similar vector ip containing the respective predicted values at the given locations and frequencies, which is 
obtained after solving (4) with all the data in ip and a given value of A. The following lemma establishes 
that the PSD map estimator is a linear smoother, which means that the predicted values are linearly related 
to the measurements, i.e., ip = S\ip for a A-dependent matrix Sa to be determined. Common examples of 
linear smoothers are ridge regressors and smoothing splines; further details are in [13, p. 153]. For linear 
smoothers, by virtue of the leave-one-out lemma [26, p. 50] it is possible to rewrite (36) as 

where <I>a( x ! /) stands for the estimated PSD map when all data in <p are utilized in (4). The beauty of the 
leave-one-out lemma stems from the fact that given A and the main diagonal of matrix Sa, the right-hand 
side of (37) indicates that fitting a single model (rather than N r N of them) suffices to evaluate OCV(A). 
The promised lemma stated next specifies the value of Sa necessary to evaluate (37). 

Lemma 3: The PSD map estimator in (4) is a linear smoother, with smoothing matrix given by 
S A = (B ® {KQ 2 - TR- 1 Q' 1 KQ 2 })[(B'B ® Q' 2 KQ 2 ) + A^iVAI]- 1 ^' ® Q' 2 ) 

+ (BI 1 - 1 ^ 1 gTR^Qi). (38) 
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Proof: Reproduce the structure of ip in Section III-A to form the supervector ip := [ip^, . . . , <p' N ]' £ 
M. NrN , by stacking each vector <p n := [$^(xi, f n ), . . . , &\{x.N r , f n )}' corresponding to the spatial PSD 
predictions at frequency f n £ F. From (5), it follows that <&\(-x r , f n ) = (b n (g) k r )'/3 — (h n ®t r )'a, where 
b' n , kj. and t' r are the nth and rth rows of B, K and T, respectively. By stacking the PSD map estimates, 
it follows that <p n = (h' n & K)(3 - (b' n ® T)a, which readily yields 

<p = (B (g) K)$ - (B T)d. (39) 

Because the estimates {a,0} are linearly related to the measurements ip [cf. (6)-(8)], so is cp as per (39), 
establishing that the PSD map estimator in (4) is indeed a linear smoother. Next, solve explicitly for {a, 0} 
in (6)-(8) and substitute the results in (39), to unveil the structure of the smoothing matrix such that 
(p~ = S\ip. Simple algebraic manipulations lead to the expression (38). ■ 

The effectiveness of the leave-one-out CV approach is corroborated via simulations in Section VI. 
E. Proof of (24)-(27): Recall the augmented Lagrangian function in (23), and let £ := {Cr}reTl for 
notational brevity. When used to solve (22), the three steps in the AD-MoM are given by: 

[SI] Local estimate updates: 

C(k + 1) = arg mm£ c [C,7(*0, v(k)] . (40) 

[S2] Auxiliary variable updates: 

7(fc + l) = argmin/: c [C(fe + l),7,«(fe)]- (41) 

7 

[S3] Multiplier updates: 

v r {k + 1) = v r (fc) + c[C(k + 1) - ~/r(k + 1)] (42) 
V r '(k + l)=V r '(k) + c[Cr(k + l)-'y r r(k + l)} (43) 

+ 1) = <(fc) + c[Cr-(k + 1) - Yr\k + 1)]. (44) 

Focusing first on [S2], observe that (23) is separable across the collection of variables {7j} and {7^ } 
that comprise 7. The minimization w.r.t. the latter group reduces to 

l r J{k + 1) = argminc||7;'|| 2 - c(c r {k + 1) + Cv{k + l)^' - (v r r ' (k) + V r ' (kfjrf' 
= l(<r(k + 1) + CAk + 1)) + ^ (<(*:) + <(*)) 

= i(Cr(fc + + (*! + !))■ (45) 
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The result in (45) assumes that v£'(fc) + v^'(fe) = 0, Vfe. A simple inductive argument over (43), (44) 
and (45) shows that this is indeed true if the multipliers are initialized such that v£ (0) + v£ (0) = 0. 

The remaining minimization in (41) with respect to {~y r } decouples into N r quadratic sub-problems [cf. 
(23)], that is 

1 c 
lr {k + 1) = arg min - ||y r - X r 7 r L - v' r (k)j r + -||CrO + 1) - Trill 

It I I 

which admit the closed-form solutions in (27). 

In order to obtain the update (24) for the prices p r , consider their definition together with (43), (44) and 
(45) to obtain 



'€Jv" r 



p r (k + l)= £ (V r '(k + l) + v r r ,(k + l) 



(<(*:) +vP(fc))+ c{2C r (k + l)-Yr'(k)-YAk) 



'eJV r r'eN,- 

= p r (k) + C £ (Cr(k + 1) " Cr'(k + 1)) 

which coincides with (24). 

Towards obtaining the updates for the local variables in £, the optimization (40) in [SI] can be also split 
into N r sub-problems, namely 



Cr 



C r (fc + l) = argmin I JL £ ||C„,|| 2 + v' r (k)Cr + f||Cr - Tr(^)||| + £ [< (fc) + (k) 
Cr I r t/=l r'e/Vr 

+ | E [llCr-7r(*)llI+llCr-7M*)lll] j 
= arg min < ^ ||Cri>||2 - I c ^ (CO) + Cr'(fc)J + c~y r (k) - p r (k) - v r (k) J C 



v=l \ r'eAf-r 

+ |(l + 2|^ r |)||Cr||l J- 

Upon dividing by c(l + 2|7V r |) each subproblem becomes identical to problem (19), and thus by Proposition 
2 takes the form of (26). 
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Fig. 1. Expansion with overlapping raised cosine pulses. 
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Fig. 2. (left) Position of sources and obstructing wall; (right) PSD generated by the active transmitters. 
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Fig. 3. (left) Aggregate map estimate in dB; (right) error evolution of the GLasso updates 
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Fig. 4. (left) Frequency bases selected by the group-Lassoed spline-based estimator; (right) and by ridge regression. 
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Fig. 5. (left) Minimization of the CV error over A; (right) and over fj,. 
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